Feat/voyageai: adding voyageai integration #4070

Open
fzowl wants to merge 9 commits into simstudioai:main from fzowl:feat/voyageai-mongodb-atlas


@fzowl fzowl commented Apr 9, 2026

Summary

Brief description of what this PR does and why.

Fixes #(issue)

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

I added unit tests, integration tests and also tested manually.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

fzowl added 7 commits March 23, 2026 18:38
…nnection string support

- Add VoyageAI tools: embeddings (voyage-3, voyage-3-large, etc.) and rerank (rerank-2, rerank-2-lite)
- Add VoyageAI block with operation dropdown (Generate Embeddings / Rerank)
- Add VoyageAI icon and register in tool/block registries
- Enhance MongoDB with connection string mode for Atlas (mongodb+srv://) support
- Add connection mode toggle to MongoDB block (Host & Port / Connection String)
- Update all 6 MongoDB API routes to accept optional connectionString
- Add 48 unit tests (VoyageAI tools, block config, MongoDB utils)
…geAI and MongoDB

- Expand VoyageAI tool tests: metadata, all models, edge cases, error codes (60 tests)
- Expand VoyageAI block tests: structure, subBlocks, conditions, params edge cases (44 tests)
- Expand MongoDB utils tests: connection modes, URI building, all validators (56 tests)
- Add live integration tests: embeddings (7 models/scenarios), rerank (5 scenarios), e2e workflow
- Integration tests use undici to bypass global fetch mock
- Tests skip gracefully when VOYAGEAI_API_KEY env var is not set
- Add voyage-4-large, voyage-4, voyage-4-lite embedding models
- Add voyage-3.5, voyage-3.5-lite embedding models
- Add rerank-2.5, rerank-2.5-lite reranking models
- Default embeddings model: voyage-3.5
- Default rerank model: rerank-2.5
- All models verified working with live API
…tegration

- New tool: voyageai_multimodal_embeddings using voyage-multimodal-3.5 model
- New API route: /api/tools/voyageai/multimodal-embeddings for server-side file handling
- Supports text, image files/URLs, video files/URLs in a single embedding
- Uses file-upload subBlocks with basic/advanced mode for images and video
- Internal proxy pattern: downloads UserFiles via downloadFileFromStorage, converts to base64
- URL validation via validateUrlWithDNS for SSRF protection
- 14 new unit tests (tool metadata, body, response transform)
- 5 new integration tests (text-only, image URL, text+image, dimensions, auth)
- 8 new block tests (multimodal operation, params, subBlocks)
Remove non-TSDoc separator comments, fix relative import in barrel
export, fix any types, and apply biome formatting fixes.
Reverts MongoDB Atlas connection string support due to validation
issues in the Zod schemas. VoyageAI integration remains intact.

cursor bot commented Apr 9, 2026

PR Summary

Medium Risk
Adds a new third-party AI integration plus an internal API route that downloads/encodes user-provided media and forwards requests to VoyageAI; failures could impact workflows and the proxy route increases exposure to input-validation/SSRF and payload-size edge cases.

Overview
Adds a new Voyage AI block with an operation selector to run text embeddings, multimodal embeddings (text + image/video via uploads or URLs), or rerank, mapping block inputs to the appropriate tool params.

Registers three new tools (voyageai_embeddings, voyageai_multimodal_embeddings, voyageai_rerank) and implements their request/response shaping, including a new internal Next.js route (/api/tools/voyageai/multimodal-embeddings) that authenticates internal calls, validates URLs, downloads uploaded media when needed, base64-encodes it, and forwards to VoyageAI’s multimodalembeddings API.

Adds a new VoyageAIIcon, extensive unit tests for the block/tool param mapping, and optional live integration tests gated by VOYAGEAI_API_KEY; also updates .gitignore to ignore .playwright-mcp/.
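The operation-to-tool routing described in this overview could be sketched roughly as follows. The three tool IDs come from the summary above; the operation values (`embeddings`, `multimodal`, `rerank`) and the lookup-table shape are assumptions made for illustration and are not taken from the block's actual code.

```typescript
// Hypothetical operation values; only the tool IDs appear in the PR summary.
type VoyageOperation = 'embeddings' | 'multimodal' | 'rerank'

const TOOL_BY_OPERATION: Record<VoyageOperation, string> = {
  embeddings: 'voyageai_embeddings',
  multimodal: 'voyageai_multimodal_embeddings',
  rerank: 'voyageai_rerank',
}

// Resolve the tool ID for a block operation, rejecting unknown values
// (including inherited keys such as "toString") instead of returning undefined.
function selectVoyageTool(operation: string): string {
  if (!Object.prototype.hasOwnProperty.call(TOOL_BY_OPERATION, operation)) {
    throw new Error(`Unsupported VoyageAI operation: ${operation}`)
  }
  return TOOL_BY_OPERATION[operation as VoyageOperation]
}
```

Failing loudly on an unknown operation keeps a misconfigured block from silently dispatching to no tool at all.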

Reviewed by Cursor Bugbot for commit 6394764. Bugbot is set up for automated code reviews on this repo. Configure here.


vercel bot commented Apr 9, 2026

@fzowl is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.


gitguardian bot commented Apr 9, 2026

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.



@fzowl fzowl changed the title from "Feat/voyageai" to "Feat/voyageai: adding voyageai integration" Apr 9, 2026

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR adds a VoyageAI integration covering text embeddings, multimodal embeddings (images + video via an internal proxy route), and document reranking, with matching block, registry entries, and a comprehensive test suite.

Two issues need attention before merge:

  • canonicalParamId equals the subblock's own id for both imageFiles and videoFile subblocks in voyageai.ts, violating the documented critical constraint that could cause params to be dropped during canonical transformation.
  • transformResponse in embeddings.ts and rerank.ts does not check response.ok, so non-2xx API responses (e.g. 401, 429) produce a cryptic TypeError: Cannot read properties of undefined instead of surfacing the actual VoyageAI error message.
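The first issue can be caught mechanically. The check below is a minimal sketch under the assumption that a subblock exposes an `id` and an optional `canonicalParamId`; the `SubBlockLike` interface and the renamed ID `imageFilesUpload` are hypothetical and not taken from the repository.

```typescript
// Hypothetical minimal shape; the project's real subblock config has more fields.
interface SubBlockLike {
  id: string
  canonicalParamId?: string
}

// Return the IDs of subblocks whose canonicalParamId equals their own id,
// which is the collision the review flags.
function findCanonicalIdCollisions(subBlocks: SubBlockLike[]): string[] {
  return subBlocks
    .filter((sb) => sb.canonicalParamId !== undefined && sb.canonicalParamId === sb.id)
    .map((sb) => sb.id)
}

// Usage: the first entry mirrors the flagged pattern, the second a renamed id.
const collisions = findCanonicalIdCollisions([
  { id: 'imageFiles', canonicalParamId: 'imageFiles' },
  { id: 'imageFilesUpload', canonicalParamId: 'imageFiles' },
  { id: 'model' },
])
console.log(collisions)
```

A check like this could run in a block-level unit test so that the constraint is enforced rather than only documented.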

Confidence Score: 3/5

Two P1 issues — canonicalParamId constraint violation and missing response.ok guards — should be fixed before merging.

The canonicalParamId === id violation is a documented critical rule that may cause the canonical param transformation layer to drop file input values at runtime. The missing response.ok check means real API errors (invalid key, rate limit) produce opaque TypeErrors rather than actionable messages. Both are present on the changed code paths and need resolution.
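A guard of the kind the review asks for might look like the sketch below. It assumes the success payload carries a `data` array of objects with an `embedding` field, and it treats the error body as opaque text; the function name and return shape are illustrative and not the tool's actual signature.

```typescript
// Assumed success payload shape; verify against the VoyageAI API reference.
interface EmbeddingsPayload {
  data?: { embedding: number[] }[]
}

async function transformEmbeddingsResponse(response: Response) {
  // Surface the upstream status and body instead of failing later with a
  // TypeError when `data` is missing from an error payload.
  if (!response.ok) {
    const body = await response.text()
    throw new Error(`VoyageAI API error (${response.status}): ${body || response.statusText}`)
  }

  const payload = (await response.json()) as EmbeddingsPayload
  if (!Array.isArray(payload.data)) {
    throw new Error('VoyageAI API returned an unexpected response shape')
  }

  return {
    success: true,
    output: { embeddings: payload.data.map((item) => item.embedding) },
  }
}
```

With this in place, a 401 or 429 reaches the user as an error that names the status code and the upstream message.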

apps/sim/blocks/blocks/voyageai.ts (canonicalParamId constraint), apps/sim/tools/voyageai/embeddings.ts and rerank.ts (response error handling)

Vulnerabilities

  • Image and video URLs are validated with validateUrlWithDNS before being passed to the VoyageAI API, preventing SSRF via crafted URLs.
  • The apiKey param correctly uses user-only visibility (not hidden) across all three tools, consistent with project policy.
  • Internal authentication is enforced via checkInternalAuth on the multimodal proxy route.
  • No secrets are logged; request IDs are used for traceability.
  • No other security concerns identified.
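For readers unfamiliar with DNS-based URL validation, the idea behind the first point can be sketched as follows. This is not the project's `validateUrlWithDNS`; `assertPublicUrl`, `isBlockedAddress`, and the injectable `Resolver` are hypothetical names, and the blocked ranges are a common but not exhaustive set.

```typescript
import { lookup } from 'node:dns/promises'
import { isIP } from 'node:net'

type Resolver = (hostname: string) => Promise<string[]>

// Default resolver: every address the hostname currently resolves to.
const defaultResolver: Resolver = async (hostname) =>
  (await lookup(hostname, { all: true })).map((entry) => entry.address)

// True for loopback, private, link-local, and unparseable addresses.
function isBlockedAddress(address: string): boolean {
  const version = isIP(address)
  if (version === 6) {
    const lower = address.toLowerCase()
    return (
      lower === '::1' || lower === '::' || lower.startsWith('::ffff:') ||
      /^f[cd]/.test(lower) || /^fe[89ab]/.test(lower)
    )
  }
  if (version !== 4) return true
  const [a, b] = address.split('.').map(Number)
  return (
    a === 0 || a === 10 || a === 127 || (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168)
  )
}

// Reject non-HTTP(S) URLs and URLs whose host resolves to a blocked address.
async function assertPublicUrl(rawUrl: string, resolve: Resolver = defaultResolver): Promise<URL> {
  const url = new URL(rawUrl)
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`)
  }
  const host = url.hostname.replace(/^\[|\]$/g, '') // strip IPv6 brackets
  const addresses = isIP(host) ? [host] : await resolve(host)
  if (addresses.length === 0 || addresses.some(isBlockedAddress)) {
    throw new Error(`URL resolves to a disallowed address: ${url.hostname}`)
  }
  return url
}
```

A check of this kind only covers the moment of validation; if the URL is fetched later, the hostname can resolve differently, so the fetch should reuse the validated address where possible.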

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/blocks/blocks/voyageai.ts | New VoyageAI block with embeddings, multimodal embeddings, and rerank operations; canonicalParamId equals the subblock id for imageFiles and videoFile, violating the documented constraint. |
| apps/sim/tools/voyageai/embeddings.ts | Text embeddings tool; transformResponse does not check response.ok, so API errors produce a cryptic TypeError instead of the actual error message. |
| apps/sim/tools/voyageai/rerank.ts | Rerank tool; same missing response.ok guard as embeddings.ts, plus truncation param declared in types but not used here. |
| apps/sim/tools/voyageai/types.ts | Type definitions for all three operations; truncation is declared on both VoyageAIEmbeddingsParams and VoyageAIRerankParams but never wired into any tool or block. |
| apps/sim/app/api/tools/voyageai/multimodal-embeddings/route.ts | Internal proxy route for multimodal embeddings; properly validates input with Zod, uses checkInternalAuth, validates URLs with DNS, and handles all file/URL content types correctly. |
| apps/sim/tools/voyageai/multimodal-embeddings.ts | Multimodal embeddings tool; routes through the internal proxy (correct pattern for file handling), response transformation delegates to the route's structured output. |
| apps/sim/tools/voyageai/voyageai.test.ts | Comprehensive unit tests for all three tools; uses vi.resetAllMocks() in afterEach which conflicts with the project's testing guidelines. |
| apps/sim/tools/voyageai/voyageai.integration.test.ts | Integration tests skipped without VOYAGEAI_API_KEY; uses undici to bypass global fetch mock correctly, covers embeddings, rerank, and multimodal scenarios. |
| apps/sim/blocks/blocks/voyageai.test.ts | Block-level unit tests with solid coverage of subBlock structure, tool routing, and params mapping; all assertions look correct. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[VoyageAI Block] --> B{Operation}
    B -->|embeddings| C[embeddingsTool]
    B -->|rerank| D[rerankTool]
    B -->|multimodal| E[multimodalEmbeddingsTool]
    C -->|POST direct| F[VoyageAI Embeddings API]
    D -->|POST direct| G[VoyageAI Rerank API]
    E -->|POST proxy| H[Internal Multimodal Route]
    H --> I{Content type}
    I -->|Text| J[content: text]
    I -->|imageFiles| K[base64 encode via storage]
    I -->|imageUrls| L[validateUrlWithDNS]
    I -->|videoFile| M[base64 encode via storage]
    I -->|videoUrl| N[validateUrlWithDNS]
    J & K & L & M & N --> O[VoyageAI Multimodal API]
    F & G & O --> P[embeddings / results / usage]

Reviews (1): Last reviewed commit: "revert: drop all MongoDB connection stri..."

- Rename imageFiles/videoFile subblock IDs to avoid canonicalParamId collision
- Add response.ok guard in embeddings and rerank transformResponse
- Remove unused truncation param from types
- Fix test pattern: use beforeEach/clearAllMocks instead of afterEach/resetAllMocks
- Add Array.isArray guard for JSON.parse of imageUrls
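The last item in this commit, guarding the result of `JSON.parse` with `Array.isArray`, could look roughly like the sketch below. The helper name `parseImageUrls` and its error messages are hypothetical; the actual route may handle invalid input differently, for example by falling back to another format.

```typescript
// Parse a JSON-encoded list of image URLs, rejecting anything that is not an array.
function parseImageUrls(raw: string): string[] {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    throw new Error('imageUrls must be valid JSON')
  }

  // JSON.parse also succeeds on objects, strings, numbers, and null,
  // so the array check has to be explicit.
  if (!Array.isArray(parsed)) {
    throw new Error('imageUrls must be a JSON array')
  }

  // Keep only non-empty string entries.
  return parsed.filter(
    (value): value is string => typeof value === 'string' && value.trim() !== ''
  )
}
```

Without the guard, an input such as `{"url": "..."}` parses successfully and only fails later, when the code tries to iterate over it.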

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 80e0099. Configure here.

@fzowl fzowl force-pushed the feat/voyageai-mongodb-atlas branch from 4962ed6 to 6394764 Compare April 12, 2026 12:40